Order-N implementation of exact exchange in extended systems 
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Exact (Hartree Fock) exchange is needed to overcome some of the limitations of local and semilocal 
approximations of density functional theory (DFT). So far, however, computational cost has lim- 
ited the use of exact exchange in plane wave calculations for extended systems. We show that this 
difficulty can be overcome by performing a unitary transformation from Bloch to Maximally Local- 
ized Wannier functions in combination with an efficient technique to compute real space Coulomb 
integrals. The resulting scheme scales linearly with system size. We validate the scheme with 
representative applications. 
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Electronic structure calculations based on density 
functional theory (DFT) have been very successful in 
studies of molecular and condensed matter systems. To 
date most DFT applications to extended material sys- 
tems have used the local density approximation (LDA) 
or the semi-local generalized gradient approximation 
(GGA) for exchange and correlation 1 . These approxi- 
mations are numerically efficient but suffer from serious 
drawbacks. In particular, the spurious self-interaction of 
each electron with itself, occurring with local and semi- 
local functionals, may lead to a poor description of tightly 
bound electronic states^. 

These deficiencies are less severe when hybrid func- 
tional approximations for exchange and correlation are 
adoptedi^ In this approach some exact exchange energy 
is mixed into the exchange-correlation energy functional. 
Extensive applications to molecular systems have shown 
that hybrid functionals are generally superior to GGA in 
the description of structural and electronic properties^. 
Applications to extended systems have been so far rela- 
tively scarce, even though available studies suggest that 
hybrid functionals should provide a better description of 
the electronic properties of insulating materials i^ii 

The main reason for the lack of applications of hybrid 
functionals to extended systems is the considerable com- 
putational cost of evaluating the exact exchange energy, 
particularly within the plane wave-pseudopotential ap- 
proach that is most frequently used for electronic struc- 
ture calculations. This has limited most applications to 
systems with a small unit cell. When large supercells are 
needed, like e.g. in ab initio molecular dynamics (AIMD) 
simulations^, a screened exchange approximation* is of- 
fen used to alleviate the computational burden of hybrid 
functional*!^. 

In this work, we present an accurate and efficient 
scheme to compute the exact exchange energy and po- 
tential for large molecules and extended insulating sys- 
tems. Our scheme can be easily included in existing plane 
wave codes and has computational cost that scales lin- 
early with system size. The approach is based on a uni- 
tary transformation of the occupied subspace from Bloch 
to (maximally) localized Wannier functions (MLWFs)ii. 
MLWFs are exponentially localized and, since the ex- 



change between two orbitals is restricted to the spa- 
tial region of orbital overlap, the amplitude of the ex- 
change interaction between two MLWFs decays rapidly 
with the distance between their centers. Thus, typically 
each Wannier orbital exchanges only with a finite num- 
ber of neighboring orbitals and the number of pair inter- 
actions per orbital is independent of system sizoi^. As a 
result, our procedure to compute exact exchange is order- 
ly, i.e. its computational cost scales linearly with system 
size. We demonstrate the effectiveness of our approach 
in two representative applications using the PBEOH hy- 
brid functional for exchange and correlation. In one we 
perform a ground state electronic and structural opti- 
mization for crystalline silicon, in the other we perform a 
finite temperature AIMD simulation for the same system. 

In the following we assume, for simplicity, a closed-shell 
system with N/2 doubly occupied one-electron states. 
Extension to spin-polarized systems is straightforward. 
The PBEQA2. total energy functional can be written as: 
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where n(r) = 2 Y^hIi \Vi( r )\ 2 i s the electronic density, 
TV is the total number of electrons, and the (pi are the 
occupied one-electron orbitals, and atomic unit (a.u.: 
Ti = m = e 2 — 1) are adopted. As in standard DFT for- 
mulations using LDA or GGA functionals, the first four 
terms in Eq. (TT]) represent the electronic kinetic energy, 
the potential energy of the electrons in the field of the nu- 
clei, the average electrostatic interaction among the elec- 
trons and the electrostatic repulsion between the nuclei, 
respectively. Here we adopt a pseudopotential formula- 
tion. Thus the sums extend to the valence states only 
while n(r) and tpi denote pseudo-density and pseudo- 
wavefunctions, respectively . The last term on the right 
hand side of Eq. ([T]) is the PBEO exchange correlation 
energy^, E^ E0 given by: 
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Here E x denotes exact exchange, E^ BE is the PBE ex- 
change, and E^ BE is the PBE correlation functional 14 . 
The exact exchange energy E x has the Hartree-Fock ex- 
pression in terms of the one-electron (pseudo-)orbitals: 



E x = —2 
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The ground state energy is obtained by minimizing the 
energy functional, Eq. ([1]), with respect to the occupied 
orbitals. This leads to the one-particle equations: 



iv 2 + Mo„(r) + ^(r) + ^J BE (r) + K PBE 
+ - J 14(r,r')^(r')dr' =e^(r), 
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where Vk(r) and V on (r) are the Hartree and the ionic 
(pseudo-)potentials, respectively. V x BE (r) and V c PBE (r), 
the PBE exchange and correlation potentials, depend on 
the electron density and its gradient at position r. The 
exact exchange potential V x (r, r') is the non-local integral 
operator of Hartree-Fock theory. It is given by: 



V x (r,r>) = -2J2 



(r) 



(5) 



We notice that the above procedure is not strictly a 
Kohn-Sham scheme. The latter would require an ex- 
change potential given by the functional derivative of 
the exchange energy with respect to the electron den- 
sity rather than with respect to the orbitals. Since the 
explicit functional dependence of the exact exchange en- 
ergy on the density is not known, implementation of a 
strict Kohn-Sham scheme would require a special pro- 
cedure such as e.g. the Optimized Effective Potential 
(OEP) method^. The latter would be considerably more 
computationally expensive than our approach while giv- 
ing essentially the same ground state energies^. 

The action of V x (r, r') on the orbital (fi in Eq. ^ is 
an orbital dependent term D x {v) given by: 
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Eq. © shows that D x (r) includes the exchange interac- 
tions of the orbital (pi with all the occupied orbitals tpj 
(including the self-interaction). Usually in extended sys- 
tem implementations^, each pair interaction in Eq. ([6]) 
is evaluated in reciprocal space taking advantage of the 
convolution theorem 1 ^ 
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where pij(G) is the Fourier Transform of pij{v) = 
(pi(r)tpj(r). This can be calculated using the Fast Fourier 
Transform (FFT) algorithm at a cost proportional to 
-/Vfft^(-^Vfft), where TVfft is the size of the plane wave 



grid. Thus, if the functions {ipi} are delocalized through- 
out the entire supercell, evaluating Eq. ([7]) for all orbital 
pairs would result in an overall computational effort pro- 
portional to TV 2 x TVfft^(^Vfft)- Neglecting the weak 
logarithmic dependence, this amounts to cubic scaling 
with size. While plane-wave LDA or GGA calculations 
have cubic scaling with size, they only require a number 
of FFTs that scales linearly with N. The need to per- 
form a number of FFTs that scales quadratically with N 
is what makes traditional plane wave implementations of 
the hybrid functional method very expensive. 

Instead of evaluating the exact exchange in terms of 
delocalized Bloch orbitals {fi}, we choose to work with 
MLWFs {<Pi}- This requires a unitary transformation of 
the occupied subspace, (fi — Uijipj, which leaves 

the ground state energy invariant. In terms of the ML- 
WFs D l x (r) becomes: 



Dt(r) = -2(«a(r)ft(r) + ^i)y(r)^(r) 
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where the self-interaction (vu) and the pair-exchange 
(vij) potentials satisfy the Poisson equations: 
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Here, Pij{v) = !pi(r)(p*-(r). In the hybrid functional for- 
malism the contribution associated to vu(r) in Eq. ([8]) 
partially cancels the spurious self-interaction present in 
the Hartree potential Vu (r) in Eq. ^ . The contribution 
associated to the pair potential ^(r) in Eq. ((5|) gives 
the exchange interaction for two electrons of equal spin 
residing in different orbitals. The potential i?y (when i is 
either equal to or different from j) can be viewed as the 
electrostatic potential generated by the charge distribu- 
tion Pij(r). 

In the Wannier representation it is convenient to work 
in real space. This point is illustrated in Fig. [TJ Since the 
exchange interaction is only present in the region where 
two orbitals overlap, i.e. where pij ^ 0, the pair poten- 
tial (r) is conveniently calculated by solving the corre- 
sponding Eq. §§§ in a spatial region significantly smaller 
than the simulation cell. Moreover only a small subset of 
orbitals tpj contribute to the exchange interaction with a 
tagged orbital tpi. 

We have implemented the above method in the CP 
code of the Quantum-Espresso package^. In the follow- 
ing, we apply our approach to compute the electronic 
ground state, to optimize the cell parameter, and to carry 
out an AIMD simulation for crystalline Si in the diamond 
structure using the PBEO functional. In these calcula- 
tions we used supercells ranging from 64 to 216 atoms. 
In all the calculations we used a PBE norm-conserving 
pseudopotential with (3s3p) valence. The plane-wave 
energy cutoff was 15 Ry and we sampled the Brillouin 
Zone at the k = point (r point). For comparison we 
also performed PBEO calculations with the same pseu- 
dopotential and plane- wave cutoff using the conventional 
reciprocal space method to calculate exact exchange as 
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implemented in the PWSCF code of Quantum-Espresso. 
These calculations were performed on the Si 2-atom unit 
cell using a large set of k points to sample the Brillouin 
Zone. 




FIG. 1: (Color online.) Overlap between a tagged Wannier or- 
bital (red) and its nearest neighboring Wannier orbitals (blue) 
in the 64-atom Si supercell. Si atoms are denoted by the green 
spheres. 

In our approach, we first perform a ground state calcu- 
lation using the semi-local PBE functional. Then, given 
the PBE Kohn-Sham eigenstates {ipi(G)} we construct 
the corresponding MLWFs {^(G)} (in reciprocal space) 
by iteratively minimizing the spread functional^?. The 
corresponding MLWFs in real space, {<^(r)}, are ob- 
tained by FFT and are represented on a uniform real 
space mesh. In the Si diamond structure, each MLWF 
is centered in the midpoint between two adjacent atoms, 
and overlaps significantly with the six nearest neighbor- 
ing orbitals, as shown in Fig.[TJ 

Since the density Pij(r) is known for each pair of or- 
bitals, we can associate with each pair of orbitals an or- 
thorhombic box with sides (l%,ly,l z z ) such that outside 
this box Pij(r) is smaller than a given cut-off value p cut , 
which we take equal to 2 x 10~ 4 bohr~ 3 in the present 
work. We then solve Eq. © inside the box. Notice that 
the box contains a greatly reduced set of grid points com- 
pared to the simulation cell. For example, in our 64-atom 
Si calculation the real space grid needed to compute the 
pair potential Vij generated by two adjacent orbitals con- 
tains only 20% of the mesh points of the simulation cell. 
Even fewer points are needed to compute the pair poten- 
tial Vij(r) generated by more distant orbitals. Since the 
density pij (r) is vanishingly small when the distance be- 
tween the orbitals i and j is sufficiently large, many pair 
interactions are negligibly small. We find that in our 
64-atom Si supercell each orbital exchanges appreciably 
only with 30 orbitals out of the set of 127 neighboring 
orbitals. 

To solve the Poisson equation the Laplace operator V 2 
is discretized on 7 mesh points. The resulting finite dif- 
ference equation has the form of a linear matrix equation 
of the type Ax = b. The symmetric and positive definite 
square matrix A is sparse and has dimension n, where 



n is the number of mesh points inside the reduced box. 
The vector x corresponds to the unknown My(r), and the 
(known) vector b corresponds to the pair density py (r). 
The values of Wy(r) at the boundary of the box are set 
by the multipole expansion: 

uy(r) = ^^^jTj-gfa, rl+1 (10) 

l.m 

where the multipoles qi m are given by the integrals: 




In Eq. (fl"Tj) the Yi m are spherical harmonics referred to 
the center of the pair density, which we define by ■ = 
/ rpij (r) / J p^ (r) . We found that inclusion of multipoles 
up to I = 6 is sufficient to achieve good accuracy. 

Solving the linearized Poisson equation Ax = b is 
equivalent to finding the vector x that minimizes the 
function /(x) = |x T Ax-b T x+c, where c is an arbitrary 
constant. This minimization is efficiently performed with 
the conjugate gradient (CG) method^. We terminate 
the CG iteration when the residue in the calculation of 
Vij(r) is everywhere smaller than 10~ 5 a.u.. In order to 
calculate the D % x in Eq. (jHJ) we need to evaluate the prod- 
ucts Vij(r)(pj(r) in the region where |£>j(r)| 2 > p cut . This 
region may include points outside the box associated to 
the pair density Py(r) but values of Vij(r) outside that 
box are easily obtained from the multipole expansion in 
Eq. (HUD- 

Having calculated the D x (r), the PBEO ground state 
is obtained by conventional electronic structure meth- 
ods. Here we optimize the electronic degrees of freedom 
via damped second order Car-Parrinello dynamics^^ in 
which the "force" acting on the orbitals, -ff PBE0 </?i(r), in- 
cludes the additional D x (r) terms to account for exact 
exchange. Finally, the exchange energy E x is given by 
the sum of the energies of the orbital pairs in presence of 
the corresponding pair potential Uy(r), 

E x = -2^ J ft(r)&(r)v y (r)dr. (12) 

y 

The exchange energy in equation (Eq. (JT^J) ) can be 
viewed as a sum of orbital contributions e x (i) : E x — 
^2i^x(i)- The i — th orbital contribution e x (i) can be 
further decomposed into self-exchange e self (i) = J tpjvu 
and pair-exchange e pair (i) = J2j^if ¥>i<Pj v ij- 

In Table U we report the calculated exchange energy 
per orbital, in crystalline Si using a 64-atom supercell. In 
this system the MLWFs are all equivalent by symmetry, 
i.e the orbital index in e x (i) can be dropped. Moreover 
the MLWF centers coincide with the bond centers and 
it is convenient to group the pair exchange contributions 
into contributions originating from the different shells of 
neighbors of a bond center. The Table lists the shell 
index / (which is for the central site, 1 for the first 
shell of neighbors, etc.), the corresponding shell radius 
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TABLE I: Contributions to the exchange energy e x (in a.u.) 
from shells of neighbors. R(I) is shell radius (bohr) and N(I) 
is the coordination number of shell /. The experimental lat- 
tice constant ao = 5.43A is used. 
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R(I) 





3.63 


6.28 


7.26 


8.11 


N(I) 


1 


6 


12 


12 


12 




-0.465 


-0.059 


-0.002 


-0.007 


-0.0001 



R(I), the corresponding coordination number N(I), and 
the corresponding exchange energy contribution e x (I), 
with e x = J2i ex(-0- 

It is evident that the largest contribution to e x comes 
from the self-interaction e x (0), and that the exchange 
contributions of the neighboring shells, e x (I), with / = 
1,2, .., goes rapidly to zero with increasing shell radius. 
As a matter of fact the exchange energy contribution of 
the fourth shell is only 1/300 — th of the contribution due 
the first shell of neighbors. 

TABLE II: Comparison of our real space method and the 
reciprocal space method implemented in PWSCF. E denotes 
total (pseudo-) energy per atom (Rd) and VBW is the valence 
band witdth (eV). 
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64 216 
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-7.865 -7.870 


-7.867 


-7.873 


VBW 


13.3 13.3 


13.3 


13.3 



In table HH we report the calculated PBE0 ground- 
state energy using two supercells, one with 64 atoms and 
one with 216 atoms. The results of the two calculations 
are compared to the results obtained with the conven- 
tional reciprocal space method using a 2-atom unit cell. 
In the case of the two large supercells we used T point 
sampling, while we used two large sets of k points in the 
conventional calculations as indicated in the Table. The 
two sets of calculations are in very close agreement: the 
valence band widths are the same, while the slight differ- 
ences in total energy can be attributed to the differences 
in the fc-point sampling. 



TABLE III: Lattice constant ao(bohr) and bulk modulus 
Bo(GPa) of a Si. 
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As a further comparison we report in Table III the 
equilibrium lattice constant ao and the bulk modulus Bq 
calculated with a 64-atom Si supercell. We also report 



in the same table the results of a conventional calcula- 
tion with a 2-atom unit cell and a 4x4x4 fc-point grid. 
Again, the results of the two calculations are in excellent 
agreement. 
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FIG. 2: Computational cost of exact exchange per iteration 
of AIMD dynamics with different supercells. The computa- 
tional cost is given by the time (s) necessary to compute exact 
exchange on a 16-CPU 3.2 Ghz Intel Xeon computer cluster. 

In our approach, the computational cost of an exact 
exchange calculation depends on the number of pair ex- 
changes that need to be included to achieve a desired 
accuracy. Since each orbital has exchange only with a 
finite number of neighboring orbitals independently of 
the system size, the computational effort of the exact ex- 
change calculation should scale linearly with system size. 
Fig. [2] shows that this is indeed the case. 

Finally, we demonstrate that our approach makes 
AIMD simulations with hybrid functionals, such as 
PBE0, feasible at a modest computational cost. In AIMD 
simulations a large number of time steps, typically tens 
of thousands, are necessary to obtain statistically mean- 
ingful results. As a consequence AIMD simulations with 
hybrid functionals are very challenging and so far have 
only been performed by making some approximation, like 
the screened exchange approximation, in the calculation 
of the exchange integrals^. In our approach we do not 
need to modify the Coulomb potential to eliminate ex- 
change interactions at large distance. These are automat- 
ically truncated by the exponential decay of the MLWFs 
and all the relevant pair exchange interactions are in- 
cluded. To show the feasibility of AIMD simulations we 
tested our approach in a finite temperature simulation 
of a Si sample with 64 atoms in a simple cubic super- 
cell geometry. The simulation was initiated by randomly 
displacing the atoms from their crystalline sites while 
their velocities were set to zero. The subsequent trajec- 
tories were obtained by numerically integrating the Car- 
Parrinello equations of motion with the standard Verlet 
algorithm^!. MLWF-based AIMD trajectories were gen- 
erated as described in Ref. 17, using the PBE0 total 
energy functional E PBE0 to compute the forces on elec- 
tronic and ionic degrees of freedom. 
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FIG. 3: (Color online.) Potential energy per atom E FBE0 (red 
line), kinetic energy per atom K (black line) and internal 
energy per atom U (blue line) vs. AIMD time steps. The 
average U sets the zero of the energy scale. A time step A = 
8 a.u. and a fictitious electronic mass of 800 a.u. were used. 

We plot in fig. [3] the time variation along a nuclear 
trajectory of E PBE0 , i.e the potential energy of the 
ions (nuclei plus core electrons), of their kinetic energy 

K = V^Ej^'^/i an( i °f the ionic internal energy 
U = K + E E0 . The internal energy is an exact con- 
stant of motion of classical nuclear dynamics but is only 
approximately constant in Car-Parrinello simulations due 
to the fictitious dynamics of the electrons. Fig. [3] shows 
that indeed U is approximately constant with minor fluc- 
tuations and no drift over the time scale of the simula- 
tion. This is the typical behavior observed in standard 
simulations of insulating systems based on LDA or GGA 
functionals. We conclude that our real space treatment of 
exact exchange does not lead to any appreciable degrada- 
tion of the quality of the integrated trajectories compared 
to standard AIMD simulations. 



The AIMD trajectory reported in fig. [3] was obtained 
on a 16 CPUs PC cluster and took 34 s of real time 
per time step. For comparison a standard GGA simula- 
tion for the same system would take only 2.5 s per time 
step on the same computational platform. This example 
shows that while hybrid functional calculations remain 
more expensive than GGA calculations, AIMD trajecto- 
ries lasting for many ps are possible with access to mod- 
erate computer resources. Moreover the order-N cost of 
the exact exchange calculation means that the overhead 
of hybrid functional calculations should be a compara- 
tively smaller fraction of the overall computational cost 
in simulations on bigger systems. 

In conclusion we have developed an order-N method to 
compute exact exchange in extended insulating systems. 
By exploring the locality of maximally localized Wannicr 
functions, we calculate the orbital dependent exchange 
potential and the corresponding exchange energy con- 
tribution directly in real space. The approach is suffi- 
ciently efficient to make AIMD simulations with hybrid 
functionals possible and can be effectively implemented 
on parallel computer platforms. Its computational effi- 
ciency should be even better for large band-gap systems 
such as e.g. water, where the MLWFs are more localized 
than in silicon. Since exact exchange is a basic ingredi- 
ent in many-body approaches to electronic excitations, 
such as e.g. the GW scheme^, our approach should fa- 
cilitate the application of these schemes to systems re- 
quiring large supercells, such as liquids and disordered 
systems in general^. 
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